library(tidyverse)
library(readxl)
library(cowplot)
library(ape)
library(nlme)
library(MCMCglmm)
# This option turn on cacheing of chunks. This will dramatically
# speed up knitting, because only chunks that have changed will
# be recompiled.
knitr::opts_chunk$set(cache = TRUE)
Packages that you will need for this problem set that you might not have installed yet:
apeMCMCglmmGo ahead and install them now. Also load the nlme package, which has the gls() function.
At the Conard Environmental Research Area, there are 20 experimental plots with 10 burned each fall and 10 unburned. The following data are grass species richness data from 5 separate, hapazardly chosen, 10 meter^2 plots within each of these 20 experimental plots. The hypothesis to be tested is that burning increases species richness.
set.seed(112)
mms<-c(rnorm(10, 5.5, 2),rnorm(10,6.5,2))
all.rich<-numeric(0)
for(ii in mms){
all.rich<-c(all.rich,as.integer(rnorm(5,ii,1)))
}
rich.dat<-data.frame('plot'=as.factor(rep(seq(1,20),each=5)),
'treatment'= rep(c('burned','unburned'),each=50),
'richness'=all.rich)
write.table(rich.dat, file="../data/richness.txt", sep="\t", row.names=FALSE)
Load in the richness data (richness.txt). Plot species richness vs. treatment. Include an appropriate amount of jitter and transparency and color points by plot.
rich.dat <- read.table("../data/richness.txt",sep="\t",header=TRUE, stringsAsFactors = FALSE)
ggplot(rich.dat, aes(x=treatment, y=richness, color=as.factor(plot))) +
geom_point(position = position_jitter(0.1), alpha=1/2)
Do you think there is a relationship between species richness and burning?
Probably not. It doesn't look like a strong relationship.
First, fit a model predicting species richness from treatment without considering plot membership. What can you conclude from the results of this model?
It is difficult to interpret without considering plot as there is pseudoreplication.
anova(lm(richness~treatment, data=rich.dat))
## Analysis of Variance Table
##
## Response: richness
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 1 32.49 32.49 6.0613 0.01556 *
## Residuals 98 525.30 5.36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now, get the mean species richness value for each plot. Fit a model predicting mean species richness from treatment. What can you conclude from this model?
mean.plot<-rich.dat %>% group_by(plot,treatment) %>% summarize(meanrich=mean(richness))
anova(lm(meanrich~treatment, data=mean.plot))
## Analysis of Variance Table
##
## Response: meanrich
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 1 6.498 6.498 1.297 0.2697
## Residuals 18 90.180 5.010
There is no effect of treatment on species richness.
Use the lme() function to fit a model predicting richness from treatment with plot as a random effect.
summary(lme(richness ~ treatment, random=~1|plot, data=rich.dat))
## Linear mixed-effects model fit by REML
## Data: rich.dat
## AIC BIC logLik
## 346.1061 356.4459 -169.053
##
## Random effects:
## Formula: ~1 | plot
## (Intercept) Residual
## StdDev: 2.196361 0.9643651
##
## Fixed effects: richness ~ treatment
## Value Std.Error DF t-value p-value
## (Intercept) 5.32 0.7078135 80 7.516104 0.0000
## treatmentunburned 1.14 1.0009995 18 1.138862 0.2697
## Correlation:
## (Intr)
## treatmentunburned -0.707
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.76767996 -0.53978614 0.05928632 0.58136355 2.26127497
##
## Number of Observations: 100
## Number of Groups: 20
Compare your results to the approach using the mean value. What can you conclude from this model?
The results are consistent with the model using the mean values.
These 20 plots exist in an alternating pattern going from south to north on either side of an access road. What other source of non-independence should you consider? How would you account for this factor?
The spatial relationship between plots needs to be accounted for. You could include a continuous variable coding for position along the south to north axis and/or side of the road.
For this activity, we will use a dataset measuring the amount of phenoloxidase (PO) produced by indian meal moth caterpillars in full-sibling families.
First we will load the MCMCglmm library and get the moth data called PlodiaPO.
library(MCMCglmm)
data(PlodiaPO)
Examine the data. Look at the first few rows and use unique() to see the list of families in this dataset. Plot a histogram of the PO values and calculate the overall mean value.
head(PlodiaPO)
## FSfamily PO plate
## 1 F1 0.6685459 1
## 2 F1 0.8024980 1
## 3 F1 0.8233658 1
## 4 F1 0.7696421 1
## 5 F1 0.9437393 1
## 6 F1 0.8923773 1
unique(PlodiaPO$FSfamily)
## [1] F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F14 F16 F17 F18 F19
## [18] F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 F31 F32 F33 F34 F35 F36
## [35] F37 F38 F39 F40 F41 F42 F43 F44 F45 F46 F47 F48 F49 F50
## 48 Levels: F1 F10 F11 F12 F14 F16 F17 F18 F19 F2 F20 F21 F22 F23 ... F9
ggplot(PlodiaPO, aes(x=PO)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
mean(PlodiaPO$PO)
## [1] 1.168576
Now we will fit the MCMCglmm model with family as a random effect.
fm <- MCMCglmm(PO ~ 1, random = ~ FSfamily, data = PlodiaPO, verbose = TRUE)
##
## MCMC iteration = 0
##
## MCMC iteration = 1000
##
## MCMC iteration = 2000
##
## MCMC iteration = 3000
##
## MCMC iteration = 4000
##
## MCMC iteration = 5000
##
## MCMC iteration = 6000
##
## MCMC iteration = 7000
##
## MCMC iteration = 8000
##
## MCMC iteration = 9000
##
## MCMC iteration = 10000
##
## MCMC iteration = 11000
##
## MCMC iteration = 12000
##
## MCMC iteration = 13000
Refer to the lecture slides for the next section. Plot the fixed effects and calculate the median for the estimate of the intercept. Compare this value to the mean value you calculated above.
plot(fm$Sol)
mean(PlodiaPO$PO)
## [1] 1.168576
median(fm$Sol)
## [1] 1.163992
Now plot the random effects and calculate heritability for all the MCMC samples. Because we have fit family here instead of using an animal model like in lecture, you should multiply your estimate by 2. This is because full-sib families share 50% of their genes on average. Plot the resulting distribution of heritability estimates. Estimate the median value and the standard error.
plot(fm$VCV)
h2 <- 2 * fm$VCV[, "FSfamily"]/(fm$VCV[, "FSfamily"] +
+ fm$VCV[, "units"])
plot(h2)
median(h2)
## [1] 0.4403454
sd(h2)
## [1] 0.1087081
We are going to work with two files, one containing a tree of (extant) primates (226 tips; Primate_Masses_Tree.nex) and another containing data of primate life history (Primate_Masses.xlsx). We will use these data to explore phylogenetic comparative methods.
Primate_Masses.xlsx file and explore the contents. You will find columns:Order: Traditional Linnean orderFamily: Traditional Linnean familyBinomial: Genus species binomial. This column matches the tree tip labels.AdultBodyMass_g: Body mass in gramsGestationLen_d: Gestation length in daysHomeRange_km2: Home range area in square kilometersMaxLongevity_m: Maximum longevity in monthsSocialGroupSize: Mean social group sizeAs a final step, be sure to convert the object into a data.frame from the tibble that read_excel() produces by default.
# FIXME
M <- read_excel("../data/Primate_Masses.xlsx") %>% as.data.frame()
Load the tree from the file Primate_Masses_Tree.nex using the function read.nexus() which is in the ape package.
Give it is useful name like tree. Plot the tree using the plot() method for phylo objects (which is what your tree will be). In the plot() call, include you can make the tip labels smaller with cex = 0.8. We expanded the figure size in the chunk header to keep the tips from overlapping. It will look like a mess in the RStudio window, but will look OK when knitted.
# FIXME
tree <- read.nexus("../data/Primate_Masses_Tree.nex")
plot(tree, cex = 0.8)
Any time you have comparative data and a tree, you should do a lot of tests to make sure that the tree labels match the data column that codes for them. Some functions will do these checks for you. However, it's important to do these checks yourself, just to be sure. They are absolutely necessary when you are using phylogenetically independent contrasts or another method that makes assumptions about a match between tips and data without an explicit check. Using gls() for PGLS will do this also, albeit with a warning message.
Start by looking at both the Binomial column of your data frame and the tip labels of the tree (e.g., tree$tip.label).
# FIXME
M$Binomial
## [1] "Ateles_belzebuth" "Ateles_geoffroyi"
## [3] "Ateles_paniscus" "Callicebus_moloch"
## [5] "Callimico_goeldii" "Callithrix_jacchus"
## [7] "Callithrix_pygmaea" "Cercocebus_galeritus"
## [9] "Cercopithecus_ascanius" "Cercopithecus_campbelli"
## [11] "Cercopithecus_cephus" "Cercopithecus_mitis"
## [13] "Cercopithecus_neglectus" "Cercopithecus_nictitans"
## [15] "Cercopithecus_pogonias" "Erythrocebus_patas"
## [17] "Lagothrix_lagotricha" "Eulemur_coronatus"
## [19] "Eulemur_fulvus" "Eulemur_macaco_macaco"
## [21] "Eulemur_mongoz" "Chiropotes_albinasus"
## [23] "Chiropotes_satanas" "Cebus_albifrons"
## [25] "Cebus_apella" "Cebus_capucinus"
## [27] "Cheirogaleus_major" "Cheirogaleus_medius"
## [29] "Colobus_guereza" "Colobus_polykomos"
## [31] "Daubentonia_madagascariensis" "Hapalemur_griseus_griseus"
## [33] "Galago_alleni" "Galago_demidoff"
## [35] "Galago_moholi" "Galago_senegalensis"
## [37] "Gorilla_gorilla_gorilla" "Hylobates_lar"
## [39] "Hylobates_pileatus" "Miopithecus_talapoin"
## [41] "Mirza_coquereli" "Lemur_catta"
## [43] "Leontopithecus_rosalia" "Lepilemur_leucopus"
## [45] "Lepilemur_mustelinus" "Lophocebus_albigena"
## [47] "Loris_tardigradus" "Macaca_fascicularis"
## [49] "Macaca_fuscata" "Macaca_mulatta"
## [51] "Macaca_nemestrina" "Macaca_radiata"
## [53] "Macaca_silenus" "Macaca_sinica"
## [55] "Macaca_sylvanus" "Mandrillus_sphinx"
## [57] "Microcebus_murinus" "Microcebus_rufus"
## [59] "Nasalis_larvatus" "Nomascus_concolor"
## [61] "Otolemur_crassicaudatus" "Otolemur_garnettii"
## [63] "Pan_paniscus" "Pan_troglodytes_troglodytes"
## [65] "Papio_anubis" "Papio_cynocephalus"
## [67] "Papio_hamadryas" "Papio_ursinus"
## [69] "Perodicticus_potto" "Phaner_furcifer"
## [71] "Prolemur_simus" "Propithecus_verreauxi"
## [73] "Pithecia_pithecia" "Pongo_pygmaeus_pygmaeus"
## [75] "Saguinus_fuscicollis" "Saguinus_midas"
## [77] "Saguinus_oedipus" "Saimiri_sciureus"
## [79] "Semnopithecus_entellus" "Tarsius_bancanus"
## [81] "Symphalangus_syndactylus" "Tarsius_syrichta"
## [83] "Alouatta_palliata" "Alouatta_pigra"
## [85] "Alouatta_seniculus" "Aotus_trivirgatus"
## [87] "Tarsius_tarsier" "Theropithecus_gelada"
## [89] "Trachypithecus_obscurus" "Trachypithecus_vetulus"
## [91] "Varecia_variegata_variegata"
tree$tip.label
## [1] "Allenopithecus_nigroviridis"
## [2] "Cercopithecus_ascanius"
## [3] "Cercopithecus_cephus"
## [4] "Cercopithecus_cephus_cephus"
## [5] "Cercopithecus_cephus_ngottoensis"
## [6] "Cercopithecus_diana"
## [7] "Cercopithecus_erythrogaster_erythrogaster"
## [8] "Cercopithecus_erythrotis"
## [9] "Cercopithecus_hamlyni"
## [10] "Cercopithecus_lhoesti"
## [11] "Cercopithecus_lowei"
## [12] "Cercopithecus_mitis"
## [13] "Cercopithecus_mona"
## [14] "Cercopithecus_neglectus"
## [15] "Cercopithecus_nictitans"
## [16] "Cercopithecus_petaurista"
## [17] "Cercopithecus_preussi"
## [18] "Cercopithecus_solatus"
## [19] "Cercopithecus_wolfi"
## [20] "Chlorocebus_aethiops"
## [21] "Chlorocebus_pygerythrus"
## [22] "Chlorocebus_sabaeus"
## [23] "Chlorocebus_tantalus"
## [24] "Erythrocebus_patas"
## [25] "Miopithecus_talapoin"
## [26] "Allocebus_trichotis"
## [27] "Avahi_laniger"
## [28] "Avahi_occidentalis"
## [29] "Cheirogaleus_crossleyi"
## [30] "Cheirogaleus_major"
## [31] "Cheirogaleus_medius"
## [32] "Daubentonia_madagascariensis"
## [33] "Eulemur_albifrons"
## [34] "Eulemur_albocollaris"
## [35] "Eulemur_collaris"
## [36] "Eulemur_coronatus"
## [37] "Eulemur_fulvus"
## [38] "Eulemur_macaco_flavifrons"
## [39] "Eulemur_macaco_macaco"
## [40] "Eulemur_mongoz"
## [41] "Eulemur_rubriventer"
## [42] "Eulemur_rufus"
## [43] "Eulemur_sanfordi"
## [44] "Hapalemur_alaotrensis"
## [45] "Hapalemur_aureus"
## [46] "Hapalemur_griseus_griseus"
## [47] "Hapalemur_griseus_meridionalis"
## [48] "Hapalemur_occidentalis"
## [49] "Indri_indri"
## [50] "Lemur_catta"
## [51] "Lepilemur_aeeclis"
## [52] "Lepilemur_ankaranensis"
## [53] "Lepilemur_dorsalis"
## [54] "Lepilemur_edwardsi"
## [55] "Lepilemur_leucopus"
## [56] "Lepilemur_microdon"
## [57] "Lepilemur_mitsinjoensis"
## [58] "Lepilemur_mustelinus"
## [59] "Lepilemur_randrianasoli"
## [60] "Lepilemur_ruficaudatus"
## [61] "Lepilemur_sahamalazensis"
## [62] "Lepilemur_seali"
## [63] "Lepilemur_septentrionalis"
## [64] "Microcebus_berthae"
## [65] "Microcebus_bongolavensis"
## [66] "Microcebus_danfossi"
## [67] "Microcebus_griseorufus"
## [68] "Microcebus_jollyae"
## [69] "Microcebus_lehilahytsara"
## [70] "Microcebus_lokobensis"
## [71] "Microcebus_mittermeieri"
## [72] "Microcebus_murinus"
## [73] "Microcebus_myoxinus"
## [74] "Microcebus_ravelobensis"
## [75] "Microcebus_rufus"
## [76] "Microcebus_sambiranensis"
## [77] "Microcebus_simmonsi"
## [78] "Microcebus_tavaratra"
## [79] "Mirza_coquereli"
## [80] "Prolemur_simus"
## [81] "Propithecus_coquereli"
## [82] "Propithecus_diadema"
## [83] "Propithecus_edwardsi"
## [84] "Propithecus_tattersalli"
## [85] "Propithecus_verreauxi"
## [86] "Varecia_rubra"
## [87] "Varecia_variegata_variegata"
## [88] "Alouatta_caraya"
## [89] "Alouatta_palliata"
## [90] "Alouatta_pigra"
## [91] "Alouatta_sara"
## [92] "Alouatta_seniculus"
## [93] "Ateles_belzebuth"
## [94] "Ateles_fusciceps"
## [95] "Ateles_geoffroyi"
## [96] "Ateles_geoffroyi_ornatus"
## [97] "Ateles_geoffroyi_vellerosus"
## [98] "Ateles_paniscus"
## [99] "Brachyteles_arachnoides"
## [100] "Lagothrix_lagotricha"
## [101] "Aotus_azarae"
## [102] "Aotus_azarae_infulatus"
## [103] "Aotus_lemurinus_griseimembra"
## [104] "Aotus_nancymaae"
## [105] "Aotus_trivirgatus"
## [106] "Callimico_goeldii"
## [107] "Callithrix_(Mico)_emiliae"
## [108] "Callithrix_argentata"
## [109] "Callithrix_aurita"
## [110] "Callithrix_geoffroyi"
## [111] "Callithrix_humeralifera"
## [112] "Callithrix_jacchus"
## [113] "Callithrix_kuhli"
## [114] "Callithrix_penicillata"
## [115] "Callithrix_pygmaea"
## [116] "Cebus_albifrons"
## [117] "Cebus_apella"
## [118] "Cebus_capucinus"
## [119] "Leontopithecus_chrysomelas"
## [120] "Leontopithecus_chrysopygus"
## [121] "Leontopithecus_rosalia"
## [122] "Saguinus_fuscicollis"
## [123] "Saguinus_geoffroyi"
## [124] "Saguinus_imperator"
## [125] "Saguinus_midas"
## [126] "Saguinus_oedipus"
## [127] "Saimiri_boliviensis_boliviensis"
## [128] "Saimiri_oerstedii"
## [129] "Saimiri_sciureus"
## [130] "Arctocebus_aureus"
## [131] "Arctocebus_calabarensis"
## [132] "Loris_lydekkerianus_grandis"
## [133] "Loris_lydekkerianus_malabaricus"
## [134] "Loris_tardigradus"
## [135] "Nycticebus_coucang"
## [136] "Nycticebus_pygmaeus"
## [137] "Perodicticus_potto"
## [138] "Bunopithecus_hoolock"
## [139] "Gorilla_gorilla_gorilla"
## [140] "Homo_sapiens"
## [141] "Hylobates_agilis"
## [142] "Hylobates_klossii"
## [143] "Hylobates_lar"
## [144] "Hylobates_moloch"
## [145] "Hylobates_muelleri"
## [146] "Hylobates_pileatus"
## [147] "Nomascus_concolor"
## [148] "Nomascus_gabriellae"
## [149] "Nomascus_leucogenys"
## [150] "Pan_paniscus"
## [151] "Pan_troglodytes_schweinfurthii"
## [152] "Pan_troglodytes_troglodytes"
## [153] "Pan_troglodytes_verus"
## [154] "Pongo_abelii"
## [155] "Pongo_pygmaeus_pygmaeus"
## [156] "Symphalangus_syndactylus"
## [157] "Callicebus_donacophilus"
## [158] "Callicebus_moloch"
## [159] "Cercocebus_agilis"
## [160] "Cercocebus_atys"
## [161] "Cercocebus_galeritus"
## [162] "Cercocebus_torquatus"
## [163] "Lophocebus_albigena"
## [164] "Lophocebus_aterrimus"
## [165] "Macaca_arctoides"
## [166] "Macaca_assamensis"
## [167] "Macaca_cyclopis"
## [168] "Macaca_fascicularis"
## [169] "Macaca_fuscata"
## [170] "Macaca_hecki"
## [171] "Macaca_leonina"
## [172] "Macaca_maura"
## [173] "Macaca_mulatta"
## [174] "Macaca_nemestrina"
## [175] "Macaca_nigra"
## [176] "Macaca_nigrescens"
## [177] "Macaca_ochreata"
## [178] "Macaca_ochreata_brunnescens"
## [179] "Macaca_pagensis"
## [180] "Macaca_radiata"
## [181] "Macaca_siberu"
## [182] "Macaca_silenus"
## [183] "Macaca_sinica"
## [184] "Macaca_sylvanus"
## [185] "Macaca_thibetana"
## [186] "Macaca_tonkeana"
## [187] "Mandrillus_leucophaeus"
## [188] "Mandrillus_sphinx"
## [189] "Papio_anubis"
## [190] "Papio_cynocephalus"
## [191] "Papio_hamadryas"
## [192] "Papio_papio"
## [193] "Papio_ursinus"
## [194] "Rungwecebus_kipunji"
## [195] "Theropithecus_gelada"
## [196] "Colobus_angolensis"
## [197] "Colobus_guereza"
## [198] "Colobus_polykomos"
## [199] "Nasalis_larvatus"
## [200] "Piliocolobus_badius"
## [201] "Presbytis_melalophos"
## [202] "Pygathrix_nemaeus"
## [203] "Rhinopithecus_avunculus"
## [204] "Rhinopithecus_bieti"
## [205] "Rhinopithecus_brelichi"
## [206] "Rhinopithecus_roxellana"
## [207] "Semnopithecus_entellus"
## [208] "Trachypithecus_(Trachypithecus)_auratus"
## [209] "Trachypithecus_(Trachypithecus)_poliocephalus"
## [210] "Trachypithecus_cristatus"
## [211] "Trachypithecus_francoisi"
## [212] "Trachypithecus_johnii"
## [213] "Trachypithecus_obscurus"
## [214] "Trachypithecus_phayrei"
## [215] "Trachypithecus_pileatus"
## [216] "Euoticus_elegantulus"
## [217] "Galago_alleni"
## [218] "Galago_demidoff"
## [219] "Galago_gallarum"
## [220] "Galago_moholi"
## [221] "Galago_senegalensis"
## [222] "Galago_zanzibaricus"
## [223] "Otolemur_crassicaudatus"
## [224] "Otolemur_garnettii"
## [225] "Tarsius_bancanus"
## [226] "Tarsius_syrichta"
Let's figure out what tree tips are not represented in the data file (a lot probably) and what data are not represented in the tree (hopefully only a few). Let's start by looking at the species that are present in both the data and the tree. We use the intersection of the Binomial column and the tip labels (the function intersect()). Change eval to true in the following chunk.
# FIXME
intersect(tree$tip.label, M$Binomial)
## [1] "Cercopithecus_ascanius" "Cercopithecus_cephus"
## [3] "Cercopithecus_mitis" "Cercopithecus_neglectus"
## [5] "Cercopithecus_nictitans" "Erythrocebus_patas"
## [7] "Miopithecus_talapoin" "Cheirogaleus_major"
## [9] "Cheirogaleus_medius" "Daubentonia_madagascariensis"
## [11] "Eulemur_coronatus" "Eulemur_fulvus"
## [13] "Eulemur_macaco_macaco" "Eulemur_mongoz"
## [15] "Hapalemur_griseus_griseus" "Lemur_catta"
## [17] "Lepilemur_leucopus" "Lepilemur_mustelinus"
## [19] "Microcebus_murinus" "Microcebus_rufus"
## [21] "Mirza_coquereli" "Prolemur_simus"
## [23] "Propithecus_verreauxi" "Varecia_variegata_variegata"
## [25] "Alouatta_palliata" "Alouatta_pigra"
## [27] "Alouatta_seniculus" "Ateles_belzebuth"
## [29] "Ateles_geoffroyi" "Ateles_paniscus"
## [31] "Lagothrix_lagotricha" "Aotus_trivirgatus"
## [33] "Callimico_goeldii" "Callithrix_jacchus"
## [35] "Callithrix_pygmaea" "Cebus_albifrons"
## [37] "Cebus_apella" "Cebus_capucinus"
## [39] "Leontopithecus_rosalia" "Saguinus_fuscicollis"
## [41] "Saguinus_midas" "Saguinus_oedipus"
## [43] "Saimiri_sciureus" "Loris_tardigradus"
## [45] "Perodicticus_potto" "Gorilla_gorilla_gorilla"
## [47] "Hylobates_lar" "Hylobates_pileatus"
## [49] "Nomascus_concolor" "Pan_paniscus"
## [51] "Pan_troglodytes_troglodytes" "Pongo_pygmaeus_pygmaeus"
## [53] "Symphalangus_syndactylus" "Callicebus_moloch"
## [55] "Cercocebus_galeritus" "Lophocebus_albigena"
## [57] "Macaca_fascicularis" "Macaca_fuscata"
## [59] "Macaca_mulatta" "Macaca_nemestrina"
## [61] "Macaca_radiata" "Macaca_silenus"
## [63] "Macaca_sinica" "Macaca_sylvanus"
## [65] "Mandrillus_sphinx" "Papio_anubis"
## [67] "Papio_cynocephalus" "Papio_hamadryas"
## [69] "Papio_ursinus" "Theropithecus_gelada"
## [71] "Colobus_guereza" "Colobus_polykomos"
## [73] "Nasalis_larvatus" "Semnopithecus_entellus"
## [75] "Trachypithecus_obscurus" "Galago_alleni"
## [77] "Galago_demidoff" "Galago_moholi"
## [79] "Galago_senegalensis" "Otolemur_crassicaudatus"
## [81] "Otolemur_garnettii" "Tarsius_bancanus"
## [83] "Tarsius_syrichta"
How many species are present in both? How many rows of data are there? How many tips or taxa need to be dropped in total (we don't know yet which are tips are which are taxa)?
83 are present in both. There are 91 data observations, so 8 must not match up.
The first step in the process is to keep only the tips of the tree where tips are present in the data. The setdiff() function returns a vector of the differences between two sets. If we use the same syntax as above but with setdiff rather than intersect, we'll get the list of tips that need to be dropped. Create that variable and call it something memorable like tips_to_drop. There should be 143 tips in the list.
tips_to_drop <- setdiff(tree$tip.label, M$Binomial)
length(tips_to_drop)
## [1] 143
Now we can actually drop the tips from the tree. The ape package has a function drop.tip() which does just this. Create a new tree (e.g., tree_pruned) where all the orphan tips are dropped. Look at the help for drop.tip() for the syntax. Then plot your tree to make sure it still looks OK. We we use this tree for the rest of this exercise.
# FIXME
tree_pruned <- drop.tip(tree, tips_to_drop)
plot(tree_pruned)
We should probably also remove any rows from the data that are not present in the tree. This isn't strictly necessary in most cases, but it's good practice. We can use setdiff() again, but this time reverse the order: setdiff(M$Binomial, tree_pruned$tip.label). Do this, and save to a new variable rows_to_drop. You should find 8 rows that need to be dropped from the data frame.
# FIXME
rows_to_drop <- setdiff(M$Binomial, tree_pruned$tip.label)
length(rows_to_drop)
## [1] 8
Drop the rows from the data that are not in the tree. The best (though least clear) way to do this is with a notation like M[!(M$Binomial %in% rows_to_drop), ]. The ! negates the entire parenthetical that follows. Inside parentheses, the function %in% returns true for all the rows in M$Binomial that match any items in rows_to_drop. So by negating this set, we get all the rows that are not in rows_to_drop.
Assign a new variable the data frame resulting from dropping all rows not in the tree file. Check the number of rows. You should find 83.
# FIXME
M_pruned <- M[!(M$Binomial %in% rows_to_drop), ]
nrow(M_pruned)
## [1] 83
The final thing to do is to assign Binomial to the row names (row.names()) of the data.frame.
# FIXME
row.names(M_pruned) <- M_pruned$Binomial
We have finally arrived at the point of being able to do something with these data. As you may have realized, working with comparative data is much more involved in terms of processing than working with regular data.
For the rest of this question, we will be exploring the relationship between body mass and maximum longevity. Make a scatterplot of the raw data for maximum longevity vs. body mass.
# FIXME
ggplot(M_pruned, aes(x = AdultBodyMass_g, y = MaxLongevity_m)) +
geom_point()
Assess the bivariate relationship and transform either or both of the variables if they require it.
# FIXME
M_pruned$log_Mass <- log10(M_pruned$AdultBodyMass_g)
M_pruned$log_Longevity <- log10(M_pruned$MaxLongevity_m)
ggplot(M_pruned, aes(x = log_Mass, y = log_Longevity, color = Family)) +
geom_point()
We can log10 transform both variables. It's probably adequate to only log mass, but it looks a little better with both transformed.
We're ready to perform PGLS. Fit a PGLS using gls() regressing longevity on mass. Use a Brownian motion correlation structure. Follow the lecture slides as an example. Save this model to an R object. Print the summary of the gls object that you just made.
# FIXME
fm1 <- gls(log_Longevity ~ log_Mass, data = M_pruned,
correlation = corBrownian(phy = tree_pruned),
method = "ML")
summary(fm1)
## Generalized least squares fit by maximum likelihood
## Model: log_Longevity ~ log_Mass
## Data: M_pruned
## AIC BIC logLik
## -67.53574 -60.27921 36.76787
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 2.073902 0.22444810 9.240005 0.000
## log_Mass 0.107660 0.05798006 1.856845 0.067
##
## Correlation:
## (Intr)
## log_Mass -0.767
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.0658018 -0.1212565 0.1200731 0.4195799 1.0937989
##
## Residual standard error: 0.3379233
## Degrees of freedom: 83 total; 81 residual
log Longevity = 2.07 + 0.11 * log Mass.
Fit a second model where Family is used in addition to mass. Use an additive model and Brownian motion model of trait evolution. Use the anova() method to generate an ANOVA table for the model fit.
fm2 <- gls(log_Longevity ~ log_Mass + Family, data = M_pruned,
correlation = corBrownian(phy = tree_pruned),
method = "ML")
summary(fm2)
## Generalized least squares fit by maximum likelihood
## Model: log_Longevity ~ log_Mass + Family
## Data: M_pruned
## AIC BIC logLik
## -42.98216 -1.861866 38.49108
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 2.2415640 0.4388535 5.107773 0.0000
## log_Mass 0.0813252 0.0805706 1.009366 0.3164
## FamilyAtelidae -0.0417823 0.2677924 -0.156025 0.8765
## FamilyCebidae 0.0208861 0.2247004 0.092951 0.9262
## FamilyCercopithecidae -0.0421210 0.3993185 -0.105482 0.9163
## FamilyCheirogaleidae -0.1564374 0.4942202 -0.316534 0.7526
## FamilyDaubentoniidae -0.0562409 0.5223874 -0.107661 0.9146
## FamilyGalagidae -0.1670835 0.4956188 -0.337121 0.7371
## FamilyHominidae 0.1984529 0.4366134 0.454528 0.6509
## FamilyHylobatidae 0.1257937 0.4250077 0.295980 0.7682
## FamilyIndriidae -0.1376173 0.5231753 -0.263042 0.7933
## FamilyLemuridae -0.0382169 0.4965885 -0.076959 0.9389
## FamilyLepilemuridae -0.3838207 0.5073930 -0.756456 0.4520
## FamilyLorisidae -0.0683020 0.4859380 -0.140557 0.8886
## FamilyPitheciidae -0.0017273 0.3082873 -0.005603 0.9955
## FamilyTarsiidae -0.2023785 0.4886245 -0.414180 0.6801
##
## Correlation:
## (Intr) lg_Mss FmlyAt FmlyCb FmlyCr FmlyCh FmlyDb
## log_Mass -0.543
## FamilyAtelidae -0.221 -0.263
## FamilyCebidae -0.430 0.034 0.666
## FamilyCercopithecidae -0.432 -0.185 0.448 0.447
## FamilyCheirogaleidae -0.688 0.114 0.293 0.370 0.451
## FamilyDaubentoniidae -0.552 -0.073 0.325 0.344 0.460 0.667
## FamilyGalagidae -0.673 0.090 0.299 0.368 0.454 0.651 0.599
## FamilyHominidae -0.304 -0.336 0.454 0.403 0.808 0.394 0.433
## FamilyHylobatidae -0.409 -0.168 0.420 0.420 0.798 0.425 0.432
## FamilyIndriidae -0.541 -0.092 0.329 0.343 0.463 0.788 0.645
## FamilyLemuridae -0.586 -0.067 0.339 0.362 0.483 0.833 0.677
## FamilyLepilemuridae -0.623 0.025 0.308 0.357 0.455 0.842 0.656
## FamilyLorisidae -0.658 0.041 0.318 0.373 0.473 0.658 0.615
## FamilyPitheciidae -0.348 -0.006 0.519 0.586 0.387 0.311 0.295
## FamilyTarsiidae -0.670 0.148 0.288 0.375 0.450 0.540 0.484
## FmlyGl FmlyHm FmlyHy FmlyIn FmlyLm FmlyLp FmlyLr
## log_Mass
## FamilyAtelidae
## FamilyCebidae
## FamilyCercopithecidae
## FamilyCheirogaleidae
## FamilyDaubentoniidae
## FamilyGalagidae
## FamilyHominidae 0.401
## FamilyHylobatidae 0.427 0.855
## FamilyIndriidae 0.597 0.439 0.435
## FamilyLemuridae 0.631 0.453 0.453 0.826
## FamilyLepilemuridae 0.626 0.412 0.428 0.775 0.817
## FamilyLorisidae 0.849 0.426 0.445 0.613 0.647 0.637
## FamilyPitheciidae 0.311 0.355 0.364 0.295 0.311 0.304 0.317
## FamilyTarsiidae 0.535 0.387 0.424 0.481 0.511 0.514 0.538
## FmlyPt
## log_Mass
## FamilyAtelidae
## FamilyCebidae
## FamilyCercopithecidae
## FamilyCheirogaleidae
## FamilyDaubentoniidae
## FamilyGalagidae
## FamilyHominidae
## FamilyHylobatidae
## FamilyIndriidae
## FamilyLemuridae
## FamilyLepilemuridae
## FamilyLorisidae
## FamilyPitheciidae
## FamilyTarsiidae 0.315
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.948621709 -0.147266508 0.000108885 0.225109698 0.823807196
##
## Residual standard error: 0.3309798
## Degrees of freedom: 83 total; 67 residual
anova(fm2)
## Denom. DF: 67
## numDF F-value p-value
## (Intercept) 1 238.16637 <.0001
## log_Mass 1 2.97286 0.0893
## Family 14 0.20290 0.9990
Considering the ANOVA table, does the relationship between longevity and mass differ by family? Briefly explain.
No. The P-value for family is 0.999, so there is no difference in the relationship based on family. The intercepts are all equal to one another.
Extract (AIC()) and compare the AICs for the two models you just fit. Interpret the difference in AIC values.
AIC(fm1, fm2)
## df AIC
## fm1 3 -67.53574
## fm2 17 -42.98216
The difference is 25 AIC units, with the simpler model much preferred.
Feel free to explore other variables in this data set.